A database is an integrated collection of logically related records or files consolidated into a common pool that provides data for one or multiple uses. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and image.
The data in a database is organized according to a database model. The model most commonly used today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships.
 Architecture

There are a number of database architectures in use. Many databases use a combination of strategies.
On-line Transaction Processing (OLTP) systems often use a "row-oriented" or an "object-oriented" datastore architecture, whereas data-warehouse and other retrieval-focused applications, like Google's BigTable or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
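The difference between the row-oriented and column-oriented layouts mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents and the names `row_store`, `column_store`, `fetch_record`, and `total_amount` are hypothetical and do not describe how any particular product lays out its data.

```python
# Hypothetical three-row table; names and values are illustrative only.
rows = [
    {"id": 1, "customer": "Ann", "amount": 30},
    {"id": 2, "customer": "Bob", "amount": 45},
    {"id": 3, "customer": "Cy", "amount": 25},
]

# Row-oriented layout: all fields of one record are kept together,
# which suits OLTP-style "fetch or update one record" access.
row_store = [(r["id"], r["customer"], r["amount"]) for r in rows]

# Column-oriented layout: all values of one column are kept together,
# which suits retrieval-focused scans over a single attribute.
column_store = {
    "id": [r["id"] for r in rows],
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def fetch_record(record_id):
    """Point lookup: returns one complete tuple from the row store."""
    return next(t for t in row_store if t[0] == record_id)


def total_amount():
    """Aggregate: reads only the 'amount' column of the column store."""
    return sum(column_store["amount"])
```

The point lookup reads one whole tuple, while the aggregate never touches the `customer` values at all, which is the trade-off the two architectures are built around.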
Document-oriented, XML, and knowledgebase systems, as well as frame databases and RDF stores (also known as triple stores), may also use a combination of these architectures in their implementation.
Not all databases have or need a database schema ("schema-less databases").
For many years the database industry has been dominated by general-purpose database systems, which offer a wide range of functions that are applicable to many, if not most, circumstances in modern data processing. These have been enhanced with extensible datatypes, pioneered in the PostgreSQL project, to allow development of a very wide range of applications.
There are also other types of databases which cannot be classified as relational databases. Most notable is the object database management system, which stores language objects natively without using a separate data definition language and without translating into a separate storage schema. Unlike relational systems, these object databases store the relationship between complex data types as part of their storage model, in a way that does not require runtime calculation of related data using relational algebra execution algorithms.
 Database management systems

Main article: Database management system
A database management system (DBMS) is software that organizes the storage of data. It controls the creation, maintenance, and use of the database storage structures of an organization and its end users. It allows organizations to place control of organization-wide database development in the hands of database administrators (DBAs) and other specialists. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.
Database management systems are categorized according to the database model that they support, such as the network, relational, or object model. The model tends to determine the query languages that are available to access the database. One commonly used query language for the relational database is SQL, although SQL syntax and function can vary from one DBMS to another. A common query language for the object database is OQL, although it is not implemented by all vendors of object databases. A great deal of the internal engineering of a DBMS is independent of the data model and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
A relational database management system (RDBMS) implements the features of the relational model. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables." This contrasts with the object database management system (ODBMS), which does store explicit pointers between related types.
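The contrast between relating data by value and relating it by pointer can be illustrated with a small sketch. It uses Python's standard `sqlite3` module for the relational side and plain Python objects as a stand-in for an object database; the table names, class names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Relational side: the link between book and author exists only as a
# matching value (author_id = id), resolved by a join at query time.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Example Author');
    INSERT INTO book VALUES (10, 'Example Title', 1);
    """
)
relational_result = conn.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON book.author_id = author.id"
).fetchall()


# Object side: the book holds an explicit reference to its author,
# so the related data is reached by following that reference.
@dataclass
class Author:
    name: str


@dataclass
class Book:
    title: str
    author: Author  # explicit pointer to the related object


book = Book("Example Title", Author("Example Author"))
object_result = (book.title, book.author.name)
```

Both sides yield the same pair of values; the difference lies in whether the relationship is computed from stored values or followed as a stored reference.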
 Five components of DBMS
According to the Wikibooks open-content textbook "Design of Main Memory Database System/Overview of DBMS", most DBMSs in use today are relational. Other, less-used DBMSs, such as the object DBMS, are generally used in areas of application-specific data management where performance and scalability take higher priority than the flexibility of ad hoc query capabilities provided via the relational algebra execution algorithms of a relational DBMS.
An RDBMS has five main components:
Interface drivers - A user or application program initiates either schema modification or content modification. These drivers are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc. Examples include DDL, DCL, DML, ODBC, and JDBC. Some vendors provide language-specific proprietary interfaces. For example, MySQL provides drivers for PHP, Python, etc.
SQL Engine - This component is responsible for interpreting and executing the SQL query. It comprises three major components.
Transaction Engine - Transactions are sequences of operations that read or write database elements, which are grouped together.
Relational Engine - Relational objects such as tables, indexes, and referential integrity constraints are implemented in this component.
Storage Engine - This component stores and retrieves data records. It also provides a mechanism to store metadata and control information such as undo logs, redo logs, lock tables, etc.
An ODBMS has four main components:
Language Drivers - A user or application program initiates either schema modification or content modification via the chosen programming language. The drivers then provide the mechanism to manage object lifecycle coupling of the application memory space with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
Query Engine - This component is responsible for interpreting and executing language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL, and others. The query engine returns language-specific collections of objects which satisfy a query predicate expressed with operators such as >, <, >=, <=, AND, OR, NOT, GroupBY, etc.
Transaction Engine - Transactions are sequences of operations that read or write database elements, which are grouped together. The transaction engine is concerned with such things as data isolation and consistency in the driver cache and data volumes by coordinating with the storage engine.
Storage Engine - This component stores and retrieves objects in an arbitrarily complex model. It also provides a mechanism to manage and store metadata and control information such as undo logs, redo logs, lock graphs, etc.
 Primary tasks of DBMS packages
Database Development: It is used to define and organize the content, relationships, and structure of the data needed to build a database.
Database Interrogation: It can access the data in a database for information retrieval and report generation. End users can selectively retrieve and display information and produce printed reports and documents.
Database Maintenance: It is used to add, delete, update, correct, and protect the data in a database.
Application Development: It is used to develop prototypes of data entry screens, queries, forms, reports, tables, and labels for a prototyped application, or to develop program code using a 4GL (fourth-generation language) or application generator.
 Types

 Operational database
These databases store detailed data needed to support the operations of the entire organization. They are also called subject-area databases (SADB), transaction databases, and production databases. Examples include:
Customer databases
Personal databases
Inventory databases
 Analytical database
These databases store data and information extracted from selected operational and external databases. They consist of summarized data and information most needed by an organization's managers and other end users. They may also be called multidimensional databases, management databases, and information databases.
 Data warehouse
A data warehouse stores data from current and previous years that has been extracted from the various operational databases of an organization. It is the central source of data that has been screened, edited, standardized, and integrated so that it can be used by managers and other end-user professionals throughout an organization.
 Distributed database
These are databases of local work groups and departments at regional offices, branch offices, manufacturing plants, and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user's own site.
 End-user database
These databases consist of a variety of data files developed by end users at their workstations. Examples of these are collections of documents in spreadsheets, word processing files, and even downloaded files.
 External database
These are databases where access to external, privately owned online databases or data banks is available for a fee to end users and organizations from commercial services. Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the Internet.
 Hypermedia databases on the web
These are sets of interconnected multimedia pages at a web site. Each consists of a home page and other hyperlinked pages of multimedia or mixed media such as text, graphics, photographic images, video clips, audio, etc.
 Navigational database
Navigational databases are characterized by the fact that objects in them are found primarily by following references from other objects. Traditionally, navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.
 In-memory databases
In-memory databases are database management systems that primarily rely on main memory for computer data storage. They are contrasted with database management systems which employ a disk storage mechanism. Main memory databases are faster than disk-optimized databases since the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides faster and more predictable performance than disk. In applications where response time is critical, such as telecommunications network equipment that operates 9-1-1 emergency systems, main memory databases are often used.
 Document-oriented databases
Document-oriented databases are computer programs designed for document-oriented applications. These systems may be implemented as a layer above a relational database or an object database. As opposed to relational databases, document-based databases do not store data in tables with uniform-sized fields for each record. Instead, each record is stored as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.
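The document model described above can be sketched with ordinary Python dictionaries. This is a toy in-memory stand-in, not the interface of any real document database; the collection contents and the helper `find` are hypothetical names introduced for illustration.

```python
# Each record is a self-describing document; the two documents below
# deliberately do not share the same set of fields.
documents = [
    {"_id": 1, "title": "Meeting notes", "tags": ["q1", "planning"]},
    {
        "_id": 2,
        "title": "Invoice",
        "total": 120.5,
        "lines": [{"item": "pen", "qty": 3}],  # a field holding several values
    },
]


def find(collection, **criteria):
    """Return the documents whose fields equal all the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# A new field can be added to one document without altering any other.
documents[0]["reviewed_by"] = "alice"
```

Adding `reviewed_by` to the first document leaves the second untouched, which is the practical meaning of having no fixed table layout.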
 Real-time databases
A real-time database is a processing system designed to handle workloads whose state is constantly changing. This differs from traditional databases containing persistent data, mostly unaffected by time. For example, a stock market changes very rapidly and is dynamic. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away. Real-time databases are useful for accounting, banking, law, medical records, multimedia, process control, reservation systems, and scientific data analysis. As computers increase in power and can store more data, they are integrating themselves into our society and are employed in many applications.
 Models

Main article: Database model
 Post-relational database models
Products offering a more general data model than the relational model are sometimes classified as post-relational. The data model in such products incorporates relations but is not constrained by the Information Principle, which requires that all information is represented by data values in relations.
Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.
Some products implementing such models have been built by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.
 Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example, as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
 Storage structures

Main article: Database storage structures
Relational database tables/indexes are typically stored in memory or on hard disk in one of many forms: ordered/unordered flat files, ISAM, heaps, hash buckets, or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.
Object databases use a range of storage mechanisms. Some use virtual memory mapped files to make the native language (C++, Java, etc.) objects persistent. This can be highly efficient, but it can make multi-language access more difficult. Others break the objects down into fixed- and varying-length components that are then clustered tightly together in fixed-size blocks on disk and reassembled into the appropriate format either for the client or in the client address space. Another popular technique is to store the objects in tuples, much like a relational database, which the database server then reassembles for the client.
Other important design choices relate to the clustering of data by category (such as grouping data by month or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is conversely often used to reduce join complexity and reduce execution time for queries.
 Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
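The "sorted list with pointers" form of index described above can be sketched with Python's standard `bisect` module. The table contents and the names `table`, `index`, and `lookup` are hypothetical; real systems typically use B-trees or hashes rather than a flat sorted list, so treat this as the simplest possible instance of the idea.

```python
import bisect

# Hypothetical table stored in arrival order (unsorted): (name, age).
table = [
    ("carol", 35),
    ("alice", 30),
    ("bob", 25),
    ("alice", 41),
]

# Index on the name column: a sorted list of (value, row position) pairs.
# The row position plays the role of the pointer to the row.
index = sorted((row[0], position) for position, row in enumerate(table))
keys = [value for value, _ in index]


def lookup(name):
    """Find all rows with the given name by binary search on the index."""
    low = bisect.bisect_left(keys, name)
    high = bisect.bisect_right(keys, name)
    return [table[position] for _, position in index[low:high]]
```

A lookup touches only the slice of the index between the two binary-search positions and then follows the stored positions back into the table, instead of scanning every row.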
Most relational DBMSs and some object DBMSs have the advantage that indexes can be created or dropped without changing existing applications making use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end user querying the database; while they affect performance, any SQL command will run with or without an index to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join, and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
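The transparency of indexes can be observed with Python's standard `sqlite3` module. The sketch below runs the same query before and after creating an index and inspects SQLite's `EXPLAIN QUERY PLAN` output; the table, the index name `idx_person_name`, and the data are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("alice", 30), ("bob", 25), ("alice", 41), ("carol", 35)],
)

QUERY = "SELECT id, name, age FROM person WHERE name = ?"


def run():
    """Execute the unchanged application query and return sorted rows."""
    return sorted(conn.execute(QUERY, ("alice",)).fetchall())


def plan():
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("alice",)).fetchall()
    return " ".join(str(row[-1]) for row in rows)  # last column is the detail text


result_before, plan_before = run(), plan()
conn.execute("CREATE INDEX idx_person_name ON person (name)")
result_after, plan_after = run(), plan()
```

The query text and its result are identical in both runs; only the plan changes, from a scan of the table to a search that uses the new index.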
An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)
A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
 Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction. Ideally, the database software should enforce the ACID rules, summarized here:
Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
Consistency: Every transaction must preserve the integrity constraints (the declared consistency rules) of the database. It cannot place the data in a contradictory state.
Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
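Atomicity and consistency can be demonstrated with Python's standard `sqlite3` module. In this minimal sketch, a transfer between two accounts either applies both updates or neither; the `account` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # declared consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(source, target, amount):
    """Move amount between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to target was undone together with the debit


def balances():
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When a transfer would overdraw the source account, the second update violates the constraint and the earlier credit to the target is rolled back with it, so no partial result is ever visible.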
 Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:
Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
Quorum: The results of read and write requests are calculated by querying a "majority" of replicas.
Multimaster: Two or more replicas sync each other via a transaction identifier.
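The master/slave and quorum concepts listed above can be sketched as a toy in-memory model. This is a deliberately simplified illustration: the classes `Master` and `Replica` and the function `quorum_read` are hypothetical, there is no networking or failure handling, and real quorum systems usually compare version numbers rather than counting equal values.

```python
from collections import Counter


class Replica:
    """A slave that catches up by replaying the master's action log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def apply(self, log):
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


class Master:
    """All writes go to the master, which logs each individual action."""

    def __init__(self, slaves):
        self.data = {}
        self.log = []
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        for slave in self.slaves:
            slave.apply(self.log)


def quorum_read(replicas, key):
    """Return the value held by a strict majority of replicas, else None."""
    counts = Counter(replica.data.get(key) for replica in replicas)
    value, votes = counts.most_common(1)[0]
    return value if votes > len(replicas) // 2 else None
```

If only one of three slaves has replayed the latest write, a quorum read still returns the older value, since that is what the majority holds until replication completes.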
Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability. This is also referred to as database clustering.
 Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.
Security is usually enforced through access control, auditing, and encryption.
Access control ensures and restricts who can connect and what can be done to the database.
Auditing logs what action or change has been performed, when, and by whom.
Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required using DSA, MD5, SSL, or a legacy encryption standard.
Enforcing security is one of the major tasks of the DBA.
In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom-based organizations holding personal data in electronic format (databases, for example) are required to register with the Data Commissioner.
 Locking

Locking is how the database handles multiple concurrent operations. This is how concurrency and some form of basic integrity are managed within the database system. Such locks can be applied at the row level, or at other levels such as page (a basic data block), extent (an array of multiple pages), or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.
In basic filesystem files or folders, only one lock at a time can be set, restricting the usage to one process only. Databases, on the other hand, can set and hold multiple locks at the same time on the different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL or transactions submitted by the users. Generally speaking, no activity on the database should translate into no or very light locking.
For most DBMSs on the market, locks are generally shared or exclusive. An exclusive lock means that no other lock can be acquired on the current data object as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, such as during an UPDATE or DELETE operation.
Shared locks, by contrast, can coexist: several of them can be held on the same data structure at the same time. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of locks, and the time a lock is held on a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).
Default locking behavior is enforced by the isolation level of the data server. Changing the isolation level will affect how shared or exclusive locks must be set on the data for the entire database system. The default isolation level is generally 1, where data cannot be read while it is being modified, which prevents "ghost data" from being returned to the end user.
At some point, intensive or inappropriate exclusive locking can lead to a "deadlock" situation between two locks, where neither lock can be released because each tries to acquire resources held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. When this happens, the processes or transactions involved in the "deadlock" will be rolled back.
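One common way to picture the fail-safe mechanism is as cycle detection in a "wait-for" graph, where each transaction points at the transactions whose locks it is waiting on. The sketch below is an assumption about how such a mechanism could work, not a description of any specific engine; the function names and the victim policy (picking the highest transaction name in the cycle) are hypothetical.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None.

    wait_for maps each transaction to the set of transactions it waits on.
    """
    done = set()  # transactions already proven not to lead to a cycle

    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]  # found a cycle back to txn
        if txn in done:
            return None
        for other in sorted(wait_for.get(txn, ())):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        done.add(txn)
        return None

    for txn in sorted(wait_for):
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None


def resolve(wait_for):
    """Sacrifice one transaction in a deadlock and release its waits."""
    cycle = find_deadlock(wait_for)
    if cycle is None:
        return None
    victim = max(cycle)  # arbitrary policy, chosen only for this sketch
    wait_for.pop(victim, None)
    for waiting in wait_for.values():
        waiting.discard(victim)  # nobody waits on the victim any more
    return victim
```

After the victim is removed, the remaining transactions no longer form a cycle and can proceed, while the victim would be rolled back and typically retried by the application.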
Databases can also be locked for other reasons, like access restrictions for given levels of user. Some databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. (See "Locking tables and databases", a section in some documentation / explanation from IBM, for more detail.) However, many modern databases don't lock the database during routine maintenance; see, e.g., "Routine Database Maintenance" for PostgreSQL.
 Applications

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.
 StatBank

Main article: StatBank
The largest statistical database maintained by the central authority of statistics in Denmark is called StatBank. This very large database is available in English, free of charge, to all users on the Internet. It is updated every day at 9.30 am (CET) and contains all new statistics in a very detailed form. The statistics can be presented as cross-tables, diagrams, or maps. There are about 2 million hits every year (2006). The output can be transferred to other programs for further compilation.
 See also

Comparison of relational database management systems
Comparison of database tools
Data hierarchy
Database theory
Database-centric architecture
Document-oriented database
Government database
In-memory database
Object database
Online database
Real time database
Relational database